ARKref: a rule-based coreference resolution system

Authors

  • Brendan T. O'Connor
  • Michael Heilman
Abstract

ARKref is a tool for noun phrase coreference that is based on the system described by Haghighi and Klein (2009), which was never publicly released. It was originally written in 2009. At the time of writing, the last released version was from March 2011. This document describes that version, which is open-source and publicly available at http://www.ark.cs.cmu.edu/ARKref.

ARKref is a deterministic, rule-based system that uses syntactic information from a constituent parser, and semantic information from an entity recognition component, to constrain the set of possible mention candidates (i.e., noun phrases) that could be antecedents for a given mention. It encodes syntactic constraints such as the fact that the noun phrases in predicate nominative constructions corefer (e.g., John was the teacher.), as well as semantic constraints such as the fact that he cannot corefer with a noun labeled as a location. After filtering candidates with these constraints, it selects as the antecedent the candidate noun phrase with the shortest (cross-sentence) tree distance from the target. Antecedent decisions are aggregated with a transitive closure to create the final entity graph.

ARKref belongs to a family of rule-based coreference systems that use rich syntactic and semantic information to make antecedent selection decisions. Besides Haghighi and Klein, current work in this vein includes Lee et al. (2013), which was one of the best performing systems in a recent CoNLL shared task. The following example provides an illustration of ARKref’s output, in which brackets denote the extent of noun phrases and indices denote the entity to which each noun phrase refers. This example emphasizes the syntactic selection criteria:
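
The procedure outlined in the abstract (filter candidates with constraints, pick the candidate at the shortest tree distance, then take the transitive closure) can be sketched roughly as follows. This is an illustrative toy, not ARKref's code and not the example output referred to above. The `Mention` class, the `path` encoding of parse-tree positions, the pronoun table, and the exact-text match for non-pronouns are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Mention:
    text: str
    # Hypothetical encoding: child indices from a virtual document root
    # (first element = sentence index) down to the noun phrase node.
    path: tuple
    sem_type: Optional[str] = None  # e.g. "PERSON", "LOCATION", or None

# Assumed toy table: semantic type required by a few personal pronouns.
PRONOUN_TYPES = {"he": "PERSON", "she": "PERSON", "him": "PERSON", "her": "PERSON"}

def compatible(mention, candidate):
    """Constraint filter (toy version of the semantic constraints)."""
    required = PRONOUN_TYPES.get(mention.text.lower())
    if required is None:
        # Non-pronouns: crude stand-in for head matching (assumption).
        return mention.text.lower() == candidate.text.lower()
    # Pronouns: reject candidates whose known type conflicts.
    return candidate.sem_type is None or candidate.sem_type == required

def tree_distance(a, b):
    """Edges between two nodes in the tree under the virtual root."""
    shared = 0
    for x, y in zip(a.path, b.path):
        if x != y:
            break
        shared += 1
    return (len(a.path) - shared) + (len(b.path) - shared)

def resolve(mentions):
    """Return entity clusters (sets of mention indices), in document order."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, m in enumerate(mentions):
        candidates = [j for j in range(i) if compatible(m, mentions[j])]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: tree_distance(m, mentions[j]))
        # Union-find merge: yields the transitive closure of the decisions.
        parent[find(i)] = find(best)

    clusters = {}
    for i in range(len(mentions)):
        clusters.setdefault(find(i), set()).add(i)
    return sorted(clusters.values(), key=min)

mentions = [
    Mention("John", (0, 0, 0), "PERSON"),
    Mention("Paris", (0, 1), "LOCATION"),
    Mention("He", (1, 0)),
]
print(resolve(mentions))
```

In this toy input, "Paris" is nearer to "He" in the tree than "John" is, so the semantic filter is what makes "He" link to "John". The union-find structure is one simple way to aggregate pairwise antecedent decisions into entities.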


Similar resources

Corpus based coreference resolution for Farsi text

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...


Coreference resolution with deep learning in the Persian Language

Coreference resolution is an advanced issue in natural language processing. Nowadays, due to the extension of social networks, TV channels, news agencies, the Internet, etc. in human life, reading all the contents, analyzing them, and finding a relation between them require time and cost. In the present era, text analysis is performed using various natural language processing techniques, one ...


Incorporating Rule-based and Statistic-based Techniques for Coreference Resolution

This paper describes a coreference resolution system for CONLL 2012 shared task developed by HLT_HITSZ group, which incorporates rule-based and statistic-based techniques. The system performs coreference resolution through the mention pair classification and linking. For each detected mention pairs in the text, a Decision Tree (DT) based binary classifier is applied to determine whether they fo...


An Exercise in Reuse of Resources: Adapting General Discourse Coreference Resolution for Detecting Lexical Chains in Patent Documentation

The Stanford Coreference Resolution System (StCR) is a multi-pass, rule-based system that scored best in the CoNLL 2011 shared task on general discourse coreference resolution. We describe how the StCR has been adapted to the specific domain of patents and give some cues on how it can be adapted to other domains. We present a linguistic analysis of the patent domain and how we were able to adap...


Combining Syntactic and Semantic Features by SVM for Unrestricted Coreference Resolution

The paper presents a system for the CoNLL 2011 shared task of coreference resolution. The system consists of two components: one for mention detection and another for coreference resolution. For mention detection, we adopted a number of heuristic rules from a syntactic parse tree perspective. For coreference resolution, we apply SVM by exploiting multiple syntactic and semantic features...



Journal:
  • CoRR

Volume: abs/1310.1975

Publication date: 2013